26 research outputs found

    Enhanced Productivity Using the Cray Performance Analysis Toolset

    Get PDF
    Abstract The purpose of an application performance analysis tool is to help the user identify whether or not their application is running efficiently on the computing resources available. However, the scale of current and future high-end systems, as well as increasing system software and architecture complexity, brings a new set of challenges to today's performance tools. In order to achieve high performance on these peta-scale computing systems, users need a new infrastructure for performance analysis that can handle the challenges associated with multiple levels of parallelism, hundreds of thousands of computing elements, and novel programming paradigms that result in the collection of massive sets of performance data. In this paper we present the Cray Performance Analysis Toolset, which is set on an evolutionary path to address the application performance analysis challenges associated with these massive computing systems by highlighting relevant data and by bringing Cray optimization knowledge to a wider set of users.

    Supporting Relative Debugging for Large-scale UPC Programs

    Get PDF
    Abstract Relative debugging is a useful technique for locating errors that emerge when existing code is ported to a new programming language or to a new computing platform. Recent attention to the UPC programming language has resulted in a number of conventional parallel programs, for example MPI programs, being ported to UPC. This paper gives an overview of the data distribution concepts used in UPC and establishes the challenges in supporting the relative debugging technique for UPC programs that run on large supercomputers. The proposed solution is implemented on an existing parallel relative debugger, CCDB, and its performance is evaluated on a Cray XE6 system with 16,348 cores.
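
    As a rough illustration of the blocked data distribution the abstract refers to, the C sketch below maps a global index of a UPC-style shared array (declared with block size B) onto an owning thread and a local offset, following the standard UPC round-robin-of-blocks layout rule. It is an illustration only, not code from the debugger, and the names and the example parameters are assumptions.

        #include <stdio.h>

        /* Illustration: for a UPC declaration such as
         *   shared [B] double A[N];
         * elements are grouped into blocks of B and the blocks are dealt
         * round-robin across THREADS threads.                             */
        typedef struct {
            int  thread;        /* owning UPC thread                       */
            long local_offset;  /* position inside that thread's slice     */
        } placement_t;

        placement_t place(long i, long B, int THREADS)
        {
            placement_t p;
            long block = i / B;                  /* which block of size B  */
            p.thread = (int)(block % THREADS);   /* round-robin over blocks */
            p.local_offset = (block / THREADS) * B + (i % B);
            return p;
        }

        int main(void)
        {
            /* Example: block size 4, 8 threads, locate element 37. */
            placement_t p = place(37, 4, 8);
            printf("element 37 -> thread %d, local offset %ld\n",
                   p.thread, p.local_offset);
            return 0;
        }

    A relative debugger has to invert exactly this kind of mapping in order to compare "the same" array element across two programs whose data distributions differ.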

    An Implementation of the POMP Performance Monitoring Interface for OpenMP Based on Dynamic Probes

    No full text
    Abstract. OpenMP has emerged as the standard for shared memory parallel programming. Unfortunately, it does not provide a standardized performance monitoring interface with which users and tool builders could write portable libraries for performance measurement of OpenMP programs. In this paper we present an implementation of a performance monitoring interface for OpenMP, based on the POMP proposal, which is built on top of DPCL, an infrastructure for binary and dynamic instrumentation. We also present overhead measurements of our implementation and show examples of its use with two versions of POMP-compliant libraries.
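
    To make the idea of such a monitoring interface concrete, here is a minimal C sketch of what a POMP-style callback library might look like: the tool implements enter/exit hooks for a parallel region, and the instrumentation layer (source rewriting, or dynamic probes as in this paper) arranges for them to be called. The function names, region identifiers, and bookkeeping below are illustrative placeholders, not the exact signatures of the POMP proposal.

        #include <stdio.h>
        #include <omp.h>

        /* Illustrative placeholder names -- not the literal POMP API.      */
        #define MAX_THREADS 256
        static double region_start[MAX_THREADS];
        static double region_time[MAX_THREADS];

        /* Called by the instrumentation layer when a thread enters a
         * monitored parallel region.                                       */
        void monitor_parallel_begin(int region_id)
        {
            (void)region_id;
            region_start[omp_get_thread_num()] = omp_get_wtime();
        }

        /* Called when the thread leaves the region: accumulate wall time.  */
        void monitor_parallel_end(int region_id)
        {
            (void)region_id;
            int t = omp_get_thread_num();
            region_time[t] += omp_get_wtime() - region_start[t];
        }

        int main(void)
        {
            #pragma omp parallel
            {
                monitor_parallel_begin(0);  /* a dynamic probe would insert this */
                /* ... user work ... */
                monitor_parallel_end(0);
            }
            for (int t = 0; t < omp_get_max_threads(); t++)
                printf("thread %d: %.6f s in region 0\n", t, region_time[t]);
            return 0;
        }

    With dynamic instrumentation, the calls in main would not appear in the source at all; the probe infrastructure inserts them into the running binary.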

    Data centric highly parallel debugging

    No full text
    Debugging parallel programs is an order of magnitude more complex than debugging sequential ones, and yet most parallel debuggers provide little more functionality than their sequential counterparts. This problem becomes more serious as computational codes become more complex, involving larger data structures, and as the machines become larger. Peta-scale machines consisting of millions of cores pose a significant challenge for existing techniques. We argue that debugging must become more data-centric, and believe that "assertions" provide a useful model. Assertions allow a user to declare their expectations about the program state as a whole rather than focusing on the state of a single process. Previously, we implemented a special type of assertion that supports debugging applications as they evolve or are ported to different platforms. These assertions allow a user to compare the state of one program against another reference version. These 'relative debugging' assertions, whilst powerful, pose significant implementation challenges for large peta-scale machines. In this paper we discuss a hashing technique that provides a scalable solution for very large problems on very large machines. We illustrate the scheme on 65k cores of Kraken, a Cray XT5 at the University of Tennessee.
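
    The general flavour of the hashing idea can be sketched as follows: each process hashes its local share of a distributed array as (global index, value) pairs, combines the per-element hashes with XOR so that the result is independent of element order and of how the data is distributed, and a single reduction produces one signature per run that can be compared between the two program versions without moving the data itself. This is only a minimal MPI sketch under those assumptions, not the algorithm implemented in the paper.

        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>
        #include <mpi.h>

        /* FNV-1a hash of a (global index, value) pair. */
        static uint64_t hash_elem(long gidx, double val)
        {
            unsigned char buf[sizeof gidx + sizeof val];
            memcpy(buf, &gidx, sizeof gidx);
            memcpy(buf + sizeof gidx, &val, sizeof val);
            uint64_t h = 1469598103934665603ULL;
            for (size_t i = 0; i < sizeof buf; i++) {
                h ^= buf[i];
                h *= 1099511628211ULL;
            }
            return h;
        }

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            /* Stand-in for this process's slice of a distributed array. */
            const long n_local = 1000, gbase = (long)rank * n_local;
            double local[1000];
            for (long i = 0; i < n_local; i++) local[i] = (double)(gbase + i);

            /* XOR-combine per-element hashes: order and distribution
             * independent, so two differently decomposed runs agree.    */
            uint64_t sig = 0;
            for (long i = 0; i < n_local; i++)
                sig ^= hash_elem(gbase + i, local[i]);

            uint64_t global_sig = 0;
            MPI_Reduce(&sig, &global_sig, 1, MPI_UINT64_T, MPI_BXOR,
                       0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("array signature: %016llx\n",
                       (unsigned long long)global_sig);

            MPI_Finalize();
            return 0;
        }

    Comparing two 8-byte signatures instead of two full arrays is what makes the comparison affordable at tens of thousands of cores.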

    Compile-time Based Performance Prediction

    No full text
    In this paper we present results obtained using a compiler to predict the performance of scientific codes. The compiler, Polaris [3], is both the primary tool for estimating the performance of a range of codes and the beneficiary of the results obtained from predicting program behavior at compile time. We show that a simple compile-time model, augmented with profiling data obtained using very light instrumentation, can be accurate to within 20% (on average) of the measured performance for codes using both dense and sparse computational methods.
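
    The flavour of such a model can be conveyed with a small sketch: the compiler counts, per loop nest, how often each operation class executes (resolving input-dependent counts with light profiling), and the predicted time is the dot product of those counts with per-operation costs measured once on the target machine. The operation categories and cost numbers below are invented for illustration and are not taken from the Polaris model.

        #include <stdio.h>

        /* Illustrative operation classes and per-operation costs (seconds),
         * as a microbenchmark on the target machine might provide them.
         * These numbers are made up for the example.                       */
        enum { OP_FADD, OP_FMUL, OP_LOAD, OP_STORE, N_OPS };
        static const double cost[N_OPS] = { 1.0e-9, 1.2e-9, 2.5e-9, 2.5e-9 };

        /* Predicted time = sum over operation classes of count * cost. */
        double predict(const double count[N_OPS])
        {
            double t = 0.0;
            for (int op = 0; op < N_OPS; op++)
                t += count[op] * cost[op];
            return t;
        }

        int main(void)
        {
            /* Counts for one loop nest; in a real tool they would come from
             * compile-time analysis plus light instrumentation.             */
            double count[N_OPS] = { 1e9, 1e9, 2e9, 1e9 };
            printf("predicted time: %.3f s\n", predict(count));
            return 0;
        }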

    Assertion Based Parallel Debugging

    No full text
    Programming languages have advanced tremendously over the years, but program debuggers have hardly changed. Sequential debuggers do little more than allow a user to control the flow of a program and examine its state. Parallel ones support the same operations on multiple processes, which is adequate for a small number of processors but becomes unwieldy and ineffective on very large machines. Typical scientific codes have enormous multi-dimensional data structures, and it is impractical to expect a user to view the data using traditional display techniques. In this paper we discuss the use of debug-time assertions, and show that these can be used to debug parallel programs. The techniques reduce debugging complexity because they reason about the state of large arrays without requiring the user to know the expected value of every element. Assertions can be expensive to evaluate, but their performance can be improved by running them in parallel. We demonstrate the system with a case study finding errors in a parallel version of the Shallow Water Equations, and evaluate the performance of the tool on a 4,096-core Cray XE6.
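
    A debug-time assertion of this kind can be pictured as a predicate over a whole array, evaluated in parallel so that only a violation count and the first offending index are reported rather than every element. The sketch below uses OpenMP for the parallel evaluation; it illustrates the general idea under those assumptions and is not the assertion machinery of the tool described in the paper.

        #include <stdio.h>
        #include <omp.h>

        /* Assert a property over every element of a large array in parallel:
         * return the number of violations and, via *first_bad, the smallest
         * violating index (or -1 if none).                                   */
        long assert_all(const double *a, long n, int (*ok)(double),
                        long *first_bad)
        {
            long violations = 0, first = n;
            #pragma omp parallel for reduction(+:violations) reduction(min:first)
            for (long i = 0; i < n; i++) {
                if (!ok(a[i])) {
                    violations++;
                    if (i < first) first = i;
                }
            }
            *first_bad = (violations > 0) ? first : -1;
            return violations;
        }

        /* Example property for a shallow-water height field: stays positive. */
        static int positive(double x) { return x > 0.0; }

        int main(void)
        {
            enum { N = 1000000 };
            static double h[N];
            for (long i = 0; i < N; i++) h[i] = 1.0;
            h[123456] = -0.5;                       /* planted error */

            long first_bad;
            long bad = assert_all(h, N, positive, &first_bad);
            printf("%ld violations, first at index %ld\n", bad, first_bad);
            return 0;
        }

    The user states a property of the data ("heights remain positive") instead of inspecting element values process by process, which is what makes the approach usable at scale.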

    A scalable parallel debugging library with pluggable communication protocols

    No full text
    Parallel debugging faces challenges in both scalability and efficiency. A number of advanced methods have been invented to improve the efficiency of parallel debugging. As the scale of systems increases, these methods rely heavily on a scalable communication protocol in order to be usable in large-scale distributed environments. This paper describes a debugging middleware that provides fundamental debugging functions and supports multiple communication protocols. Its pluggable architecture allows users to select an appropriate communication protocol as a plug-in for debugging on different platforms. It aims to be used by various advanced debugging technologies across different computing platforms. The performance of this debugging middleware is examined on a Cray XE supercomputer with 21,760 CPU cores.
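
    The pluggable design can be sketched as an abstract transport interface that the debugger core programs against, with each protocol (for example an MPI-based tree or a TCP fan-out) supplied as a plug-in implementing the same function table. The C interface below is a hypothetical illustration of that architecture, not the actual API of the library described in the paper.

        #include <stdio.h>
        #include <string.h>

        /* Hypothetical transport interface the debugger core programs against.
         * Each communication protocol supplies one of these function tables.  */
        typedef struct {
            const char *name;
            int  (*init)(int argc, char **argv);
            int  (*broadcast)(const void *buf, size_t len);  /* front-end -> all */
            int  (*gather)(void *buf, size_t len);           /* all -> front-end */
            void (*finalize)(void);
        } transport_t;

        /* A trivial "loopback" plug-in standing in for a real protocol. */
        static int  lo_init(int argc, char **argv) { (void)argc; (void)argv; return 0; }
        static int  lo_broadcast(const void *buf, size_t len) { (void)buf; (void)len; return 0; }
        static int  lo_gather(void *buf, size_t len) { memset(buf, 0, len); return 0; }
        static void lo_finalize(void) {}

        static const transport_t loopback = {
            "loopback", lo_init, lo_broadcast, lo_gather, lo_finalize
        };

        /* The core selects a plug-in at start-up and then issues every
         * debugger command through it, never seeing protocol details.   */
        int main(int argc, char **argv)
        {
            const transport_t *t = &loopback;    /* selection would be dynamic */
            if (t->init(argc, argv) != 0) return 1;
            const char cmd[] = "set breakpoint at main";
            t->broadcast(cmd, sizeof cmd);
            char replies[64];
            t->gather(replies, sizeof replies);
            printf("used transport: %s\n", t->name);
            t->finalize();
            return 0;
        }

    Keeping the debugger logic above this seam is what lets the same middleware run over whichever protocol scales best on a given platform.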